Large margin multinomial mixture model for text categorization
Abstract
In this paper, we present a novel discriminative training method for multinomial mixture models (MMMs) in text categorization, based on the large margin principle. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as a linear programming (LP) problem, which can be solved efficiently and reliably by many general-purpose optimization tools, even for very large models. Text categorization experiments on the standard RCV1 text corpus show that the LME method for MMMs substantially improves classification accuracy over the traditional training method based on the EM algorithm. Compared with the EM method, the proposed LME method achieves over 20% relative error reduction on three independent test sets of RCV1.
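The abstract does not give the exact objective, so the following is only a minimal sketch of the quantities a large margin criterion is usually built on: the log-likelihood of a document under a class-conditional multinomial mixture, and the margin between the true class and its strongest rival. The function names, the toy vocabulary of three words, and the two example classes are assumptions made for illustration; they are not taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmm_log_likelihood(counts, weights, components):
    """Log-likelihood of a word-count vector under one multinomial mixture.

    The multinomial coefficient is omitted because it is identical for
    every class and cancels when class scores are compared.
    """
    per_component = []
    for weight, theta in zip(weights, components):
        ll = math.log(weight)
        ll += sum(c * math.log(p) for c, p in zip(counts, theta) if c > 0)
        per_component.append(ll)
    return log_sum_exp(per_component)


def margin(counts, models, true_class):
    """Score of the true class minus the best competing class score.

    A positive margin means the document is classified correctly
    (equal class priors are assumed here for simplicity).
    """
    scores = {
        label: mmm_log_likelihood(counts, weights, components)
        for label, (weights, components) in models.items()
    }
    best_rival = max(s for label, s in scores.items() if label != true_class)
    return scores[true_class] - best_rival


# Hypothetical toy models: class label -> (mixture weights, word distributions).
models = {
    "sports": ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "finance": ([1.0], [[0.1, 0.2, 0.7]]),
}
doc = [3, 1, 0]  # counts for the three vocabulary words
print(margin(doc, models, "sports"))
```

In a large margin setting, training would adjust the mixture parameters so that the smallest such margin over the training documents becomes as large as possible; how the paper relaxes that problem into a linear program is not described in the abstract and is not attempted here.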
Related resources
Unsupervised Feature-Rich Clustering
Unsupervised clustering of documents is challenging because documents can conceivably be divided across multiple dimensions. Motivated by prior work incorporating expressive features into unsupervised generative models, this paper presents an unsupervised model for categorizing textual data which is capable of utilizing arbitrary features over a large context. Utilizing locally normalized log-l...
Clustering Images with Multinomial Mixture Models
In this paper, we propose a method for image clustering using multinomial mixture models. The mixture of multinomial distributions, often called multinomial mixture, is a probabilistic model mainly used for text mining. The effectiveness of multinomial distribution for text mining originates from the fact that words can be regarded as independently generated in the first approximation. In this ...
Multinomial Mixture Modelling for Bilingual Text Classification
Mixture modelling of class-conditional densities is a standard pattern classification technique. In text classification, the use of class-conditional multinomial mixtures can be seen as a generalisation of the Naive Bayes text classifier relaxing its (class-conditional feature) independence assumption. In this paper, we describe and compare several extensions of the class-conditional multinomia...
Dirichlet Mixtures in Text Modeling
Word rates in text vary according to global factors such as genre, topic, author, and expected readership (Church and Gale 1995). Models that summarize such global factors in text or at the document level, are called ‘text models.’ A finite mixture of Dirichlet distribution (Dirichlet Mixture or DM for short) was investigated as a new text model. When parameters of a multinomial are drawn from ...
A Survey Paper On Naive Bayes Classifier For Multi-Feature Based Text Mining
Text mining is a variant of a field called data mining. Text mining, also referred to as “Text Analytics”, is used to make unstructured data workable by the computer. Text categorization, also called topic spotting, is the task of automatically classifying a set of documents into groups from a predefined set. Text classification is an essential application and research topic because of incr...
Publication date: 2008